Clifton
Robust Bayesian Optimisation with Unbounded Corruptions
Ezzerg, Abdelhamid, Bogunovic, Ilija, Knoblauch, Jeremias
Bayesian Optimization is critically vulnerable to extreme outliers. Existing provably robust methods typically assume a bounded cumulative corruption budget, which makes them defenseless against even a single corruption of sufficient magnitude. To address this, we introduce a new adversary whose budget is only bounded in the frequency of corruptions, not in their magnitude. We then derive RCGP-UCB, an algorithm coupling the famous upper confidence bound (UCB) approach with a Robust Conjugate Gaussian Process (RCGP). We present stable and adaptive versions of RCGP-UCB, and prove that they achieve sublinear regret in the presence of up to $O(T^{1/2})$ and $O(T^{1/3})$ corruptions with possibly infinite magnitude. This robustness comes at near zero cost: without outliers, RCGP-UCB's regret bounds match those of the standard GP-UCB algorithm.
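To make the UCB step mentioned in the abstract concrete, here is a minimal sketch of one upper-confidence-bound selection over a finite candidate set. It uses a standard zero-mean Gaussian process posterior, not the paper's RCGP, and the RBF kernel, `noise_var`, `beta`, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, noise_var=1e-2, lengthscale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_cand."""
    k_oo = rbf_kernel(x_obs, x_obs, lengthscale) + noise_var * np.eye(len(x_obs))
    k_co = rbf_kernel(x_cand, x_obs, lengthscale)
    mean = k_co @ np.linalg.solve(k_oo, y_obs)
    # The RBF kernel has prior variance 1 at every point.
    var = 1.0 - np.sum(k_co * np.linalg.solve(k_oo, k_co.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def ucb_next_point(x_obs, y_obs, x_cand, beta=4.0):
    """Pick the candidate maximising mean + sqrt(beta) * std (assumed fixed beta)."""
    mean, std = gp_posterior(x_obs, y_obs, x_cand)
    return x_cand[np.argmax(mean + np.sqrt(beta) * std)]

x_obs = np.array([0.0])
y_obs = np.array([0.0])
x_cand = np.array([0.0, 5.0])
print(ucb_next_point(x_obs, y_obs, x_cand))
```

With a single observation at 0, the rule prefers the unexplored candidate because its posterior standard deviation is larger. Because the standard posterior mean is linear in the observations, one arbitrarily large corrupted `y_obs` value can move it arbitrarily far, which is the vulnerability the abstract describes.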
- North America > United States > New Jersey > Passaic County > Clifton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
Ontology Creation and Management Tools: the Case of Anatomical Connectivity
Kokash, Natallia, de Bono, Bernard, Gillespie, Tom
Ontologies are essential for developing standardized vocabularies and defining relationships that help describe and interpret data from diverse sources. They are crucial for achieving semantic interoperability in many domains, allowing different systems to exchange data with a consistent and shared meaning. Ontologies are extensively used in biological and biomedical research (Hoehndorf et al., 2015; Antezana et al., 2009), due to their ability to: provide standard identifiers for classes and relationships representing complex phenomena; include metadata to clarify the intended meaning of classes and relationships; include machine-readable definitions that allow computational access to class properties and relationships; standardize vocabulary across multiple data sources. Ontology-based data integration plays a vital role in neuroscience, where researchers synthesize knowledge across physiology, anatomy, molecular and developmental biology, cytology, and mathematical modeling to support accurate data representation, analysis, and simulation. A common challenge for many large neuroscience projects is the integration of data across a wide diversity of species, spatial resolutions, and temporal scales.
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > New Jersey > Passaic County > Clifton (0.04)
- North America > United States > California (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.86)
In silico study on the cytotoxicity against Hela cancer cells of xanthones bioactive compounds from Garcinia cowa: QSAR based on Graph Deep Learning, Network Pharmacology, and Molecular Docking
Son, Nguyen Manh, Vang, Pham Huu, Dung, Nguyen Thi, Thao, Nguyen Manh Ha. Ta Thi, Thuy, Tran Thi Thu, Giang, Phan Minh
Institute of Natural Products Chemistry, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Nghia Do, Cau Giay, Hanoi, Vietnam
Abstract: Cancer is recognized as a complex group of diseases, contributing to the highest global mortality rates, with increasing prevalence and a trend toward affecting younger populations. It is characterized by uncontrolled proliferation of abnormal cells, invasion of adjacent tissues, and metastasis to distant organs. Garcinia cowa, a traditional medicinal plant widely used in Southeast Asia, including Vietnam, is employed to treat fever, cough, indigestion, as a laxative, and for parasitic diseases. Numerous xanthone compounds isolated from this species exhibit a broad spectrum of biological activities, with some showing promise as anti-cancer and antimalarial agents. Network pharmacology analysis successfully identified key bioactive compounds--Rubraxanthone, Garcinone D, Norcowanin, Cowanol, and Cowaxanthone--alongside their primary protein targets (TNF, CTNNB1, SRC, NFKB1, and MTOR), providing critical insights into the molecular mechanisms underlying their anti-cancer effects. The Graph Attention Network algorithm demonstrated superior predictive performance, achieving an R of 0.98 and an RMSE of 0.02 after data augmentation, highlighting its accuracy in predicting pIC50 values for xanthone-based compounds. Additionally, molecular docking revealed MTOR as a potential target for inducing cytotoxicity in HeLa cancer cells from Garcinia cowa.
Keywords: Garcinia cowa, Hela, Network pharmacology, Graph neural network, Molecular docking
I. Introduction
Cancer is a complex group of diseases and one of the leading causes of mortality worldwide, characterized by the uncontrolled proliferation of abnormal cells, the ability to invade adjacent tissues, and metastasis to distant organs in the body [1, 2].
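For readers unfamiliar with the quantities quoted in the abstract above, the following sketch shows how pIC50 is conventionally obtained from an IC50 value and how RMSE and a Pearson correlation are computed. The function names and sample numbers are hypothetical, and the abstract does not say whether its "R" is a Pearson coefficient or a coefficient of determination, so the Pearson form here is an assumption.

```python
import math

def pic50_from_ic50_um(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (norm_t * norm_p)

# Hypothetical measured IC50 values (micromolar) and hypothetical model outputs.
measured = [pic50_from_ic50_um(v) for v in (1.0, 10.0, 100.0)]
predicted = [5.9, 5.1, 4.0]
print(measured, rmse(measured, predicted), pearson_r(measured, predicted))
```

A lower IC50 gives a higher pIC50, so more potent compounds sit at the top of the scale.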
- Asia > Vietnam > Hanoi > Hanoi (0.24)
- Asia > Southeast Asia (0.24)
- Asia > Thailand (0.04)
- (4 more...)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
Sequence-based protein-protein interaction prediction and its applications in drug discovery
Charih, François, Green, James R., Biggar, Kyle K.
Aberrant protein-protein interactions (PPIs) underpin a plethora of human diseases, and disruption of these harmful interactions constitutes a compelling treatment avenue. Advances in computational approaches to PPI prediction have closely followed progress in deep learning and natural language processing. In this review, we outline the state-of-the-art for sequence-based PPI prediction methods and explore their impact on target identification and drug discovery. We begin with an overview of commonly used training data sources and techniques used to curate these data to enhance the quality of the training set. Subsequently, we survey various PPI predictor types, including traditional similarity-based approaches, and deep learning-based approaches with a particular emphasis on the transformer architecture. Finally, we provide examples of PPI prediction in systems-level proteomics analyses, target identification, and design of therapeutic peptides and antibodies. We also take the opportunity to showcase the potential of PPI-aware drug discovery models in accelerating therapeutic development.
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- North America > United States > New York > New York County > New York City (0.14)
- North America > Cuba (0.04)
- (7 more...)
- Overview (0.87)
- Research Report (0.82)
Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence
Sun, Yingying, A, Jun, Liu, Zhiwei, Sun, Rui, Qian, Liujia, Payne, Samuel H., Bittremieux, Wout, Ralser, Markus, Li, Chen, Chen, Yi, Dong, Zhen, Perez-Riverol, Yasset, Khan, Asif, Sander, Chris, Aebersold, Ruedi, Vizcaíno, Juan Antonio, Krieger, Jonathan R, Yao, Jianhua, Wen, Han, Zhang, Linfeng, Zhu, Yunping, Xuan, Yue, Sun, Benjamin Boyang, Qiao, Liang, Hermjakob, Henning, Tang, Haixu, Gao, Huanhuan, Deng, Yamin, Zhong, Qing, Chang, Cheng, Bandeira, Nuno, Li, Ming, E, Weinan, Sun, Siqi, Yang, Yuedong, Omenn, Gilbert S., Zhang, Yue, Xu, Ping, Fu, Yan, Liu, Xiaowen, Overall, Christopher M., Wang, Yu, Deutsch, Eric W., Chen, Luonan, Cox, Jürgen, Demichev, Vadim, He, Fuchu, Huang, Jiaxing, Jin, Huilin, Liu, Chao, Li, Nan, Luan, Zhongzhi, Song, Jiangning, Yu, Kaicheng, Wan, Wanggen, Wang, Tai, Zhang, Kang, Zhang, Le, Bell, Peter A., Mann, Matthias, Zhang, Bing, Guo, Tiannan
Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.
- Europe > United Kingdom (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- Asia > China > Beijing > Beijing (0.05)
- (19 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches
Liu, Xuefeng, Jiang, Songhao, Duan, Xiaotian, Vasan, Archit, Liu, Chong, Tien, Chih-chan, Ma, Heng, Brettin, Thomas, Xia, Fangfang, Foster, Ian T., Stevens, Rick L.
Protein-ligand binding [Clyde et al., 2023] refers to the process, shown in Figure 1, by which ligands--usually small molecules, ions, or proteins--generate signals by binding to the active sites of target proteins through intermolecular forces. This binding typically changes the conformation of target proteins, which then results in the realization, modulation, or alteration of protein functions. Therefore, protein-ligand binding plays a central role in most, if not all, important life processes. For example, oxygen molecules are bound and carried through the human body by proteins like hemoglobin, and then utilized for energy production, while nonsteroidal anti-inflammatory drugs (NSAIDs) like ibuprofen work by inhibiting the functionality of the cyclooxygenase (COX) enzyme, thus reducing the release of pain-causing substances in the body. The concept and importance of binding affinity prediction were first addressed in Böhm [1994]: given the 3D structures of a target protein and a potential ligand, the objective is to predict the binding constant of such a complex, along with the most probable binding pose candidates. The prediction of the binding site (the set of protein residues that have at least one non-hydrogen atom within 4.0 Å of a ligand's non-hydrogen atom [Khazanov and Carlson, 2013]) and affinity (binding constants such as inhibition or dissociation constants, or the concentration at 50% inhibition) are usually divided into two separate but related stages [Ballester and Mitchell, 2010a]. One notable motivation for constructing a good binding affinity predictor (or scoring function, as called in some earlier work) is the essential role that it plays in drug discovery [Liu et al., 2023, 2024a] and virtual screening [Meng et al., 2011, Pinzi and Rastelli, 2019, Sadybekov and Katritch, 2023]. Traditional drug discovery essentially involves a process of trial and error.
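The binding-site definition quoted above (residues with at least one non-hydrogen atom within 4.0 Å of a ligand non-hydrogen atom) can be sketched as a simple distance filter. The function name, the array layout, and the assumption that hydrogens have already been removed from both inputs are illustrative choices, not part of the cited work.

```python
import numpy as np

def binding_site_residues(protein_xyz, protein_res_ids, ligand_xyz, cutoff=4.0):
    """Return sorted residue ids having any heavy atom within `cutoff` angstroms
    of any ligand heavy atom. Inputs are assumed to contain heavy atoms only."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    res_ids = np.asarray(protein_res_ids)
    # Pairwise distances, shape (n_protein_atoms, n_ligand_atoms).
    deltas = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    dists = np.linalg.norm(deltas, axis=-1)
    # A protein atom counts if it is close to at least one ligand atom.
    close = (dists <= cutoff).any(axis=1)
    return sorted(set(res_ids[close].tolist()))

# Toy coordinates: two atoms in residue 1, one atom in residue 2.
protein = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
residues = [1, 1, 2]
ligand = [[3.0, 0.0, 0.0]]
print(binding_site_residues(protein, residues, ligand))
```

The dense distance matrix is fine for a toy case; for full-size structures a spatial index such as a k-d tree would likely be preferable.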
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- (5 more...)
Computing in the Life Sciences: From Early Algorithms to Modern AI
Donkor, Samuel A., Walsh, Matthew E., Titus, Alexander J.
Computing in the life sciences has undergone a transformative evolution, from early computational models in the 1950s to the applications of artificial intelligence (AI) and machine learning (ML) seen today. This paper highlights key milestones and technological advancements through the historical development of computing in the life sciences. The discussion includes the inception of computational models for biological processes, the advent of bioinformatics tools, and the integration of AI/ML in modern life sciences research. Attention is given to AI-enabled tools used in the life sciences, such as scientific large language models and bio-AI tools, examining their capabilities, limitations, and impact to biological risk. This paper seeks to clarify and establish essential terminology and concepts to ensure informed decision-making and effective communication across disciplines. The views and opinions expressed within this manuscript are those of the authors and do not necessarily reflect the views and opinions of any organization the authors are affiliated with.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (6 more...)
FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction
Young, Adamo, Wang, Fei, Wishart, David, Wang, Bo, Röst, Hannes, Greiner, Russ
The process of identifying a compound from its mass spectrum is a critical step in the analysis of complex mixtures. Typical solutions for the mass spectrum to compound (MS2C) problem involve matching the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to mass spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted spectra. Unfortunately, many existing C2MS models suffer from problems with prediction resolution, scalability, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately predict high-resolution spectra. FraGNNet uses a structured latent space to provide insight into the underlying processes that define the spectrum. Our model achieves state-of-the-art performance in terms of prediction error, and surpasses existing C2MS models as a tool for retrieval-based MS2C.
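The library-matching approach described in the abstract can be illustrated with a small retrieval sketch. Cosine similarity over binned intensity vectors is used here as one common scoring choice; the abstract does not name a specific score, so the scoring function, bin width, function names, and toy spectra are all assumptions.

```python
import numpy as np

def bin_spectrum(peaks, max_mz=1000.0, bin_width=1.0):
    """Turn (m/z, intensity) peaks into a fixed-length intensity vector."""
    vec = np.zeros(int(np.ceil(max_mz / bin_width)))
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < len(vec):
            vec[idx] += intensity
    return vec

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_library(query_peaks, library):
    """Return library entry names sorted by decreasing similarity to the query."""
    query_vec = bin_spectrum(query_peaks)
    scores = {
        name: cosine_similarity(query_vec, bin_spectrum(peaks))
        for name, peaks in library.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical library of two reference spectra and one unknown query spectrum.
library = {
    "compound_A": [(100.2, 1.0), (200.5, 0.5)],
    "compound_B": [(300.1, 1.0)],
}
query = [(100.4, 0.9), (200.1, 0.6)]
print(rank_library(query, library))
```

In this picture, a C2MS model would add predicted spectra as extra entries in `library`, which is how it can help when measured reference spectra are missing.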
- North America > Canada > Ontario > Toronto (0.14)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > New Jersey > Passaic County > Clifton (0.04)
Machine learning applied to omics data
Calviño, Aida, Moreno-Ribera, Almudena, Pineda, Silvia
In this chapter we illustrate the use of some Machine Learning techniques in the context of omics data. More precisely, we review and evaluate the use of Random Forest and Penalized Multinomial Logistic Regression for integrative analysis of genomics and immunomics in pancreatic cancer. Furthermore, we propose the use of association rules with predictive purposes to overcome the low predictive power of the previously mentioned models. Finally, we apply the reviewed methods to a real data set from TCGA made of 107 tumoral pancreatic samples and 117,486 germline SNPs, showing the good performance of the proposed methods to predict the immunological infiltration in pancreatic cancer.
- Europe > Spain > Galicia > Madrid (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (5 more...)
- Research Report > Experimental Study (0.66)
- Research Report > New Finding (0.66)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.95)
- Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.54)
MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using Graph Transformers
Young, Adamo, Wang, Bo, Röst, Hannes
Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule. Although mass spectrometry is applied in many areas, the vast majority of small molecules lack experimental reference spectra. For over seventy years, spectrum prediction has remained a key challenge in the field. Existing deep learning methods do not leverage global structure in the molecule, potentially resulting in difficulties when generalizing to new data. In this work we propose a new model, MassFormer, for accurately predicting tandem mass spectra. MassFormer uses a graph transformer architecture to model long-distance relationships between atoms in the molecule. The transformer module is initialized with parameters obtained through a chemical pre-training task, then fine-tuned on spectral data. MassFormer outperforms competing approaches for spectrum prediction on multiple datasets, and is able to recover prior knowledge about the effect of collision energy on the spectrum. By employing gradient-based attribution methods, we demonstrate that the model can identify relationships between fragment peaks. To further highlight MassFormer's utility, we show that it can match or exceed existing prediction-based methods on two spectrum identification tasks. We provide open-source implementations of our model and baseline approaches, with the goal of encouraging future research in this area.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New Jersey > Passaic County > Clifton (0.04)
- Europe > Montenegro (0.04)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)